# Load libraries
library(tidyverse)
library(kableExtra)
library(lubridate)
library(plotly)
library(ggmap)
# Get the Data
individuals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-23/individuals.csv')
locations <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-23/locations.csv')
# Or read in with tidytuesdayR package (https://github.com/thebioengineer/tidytuesdayR)
# Either ISO-8601 date or year/week works!
# Install via devtools::install_github("thebioengineer/tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2020-06-23')
tuesdata <- tidytuesdayR::tt_load(2020, week = 26)
individuals <- tuesdata$individuals
This report is ETC5521 Assignment 1 by Team taipan, comprising Helen Evangelina and Yiwen Jiang.
This report presents findings on woodland caribou between 1988 and 2016, based on tracking data collected by the B.C. Ministry of Environment & Climate Change. It mainly analyses changes in the number of woodland caribou; further analyses cover habitat changes between seasons, the effect of the management plans, and the reasons tag deployments ended. The following section describes the data set, where it came from, and what it was prepared for, including how we transformed and cleaned the raw data for analysis. The analysis was carried out in R and RStudio.
Caribou are the only large herbivore widely distributed in high-elevation habitat, and they act as agents of plant and lichen diversity through trampling and foraging. Caribou have also been a significant resource for indigenous peoples for millennia (BC Ministry of Environment, 2014). Their survival rate is generally low because of predation by wolves (Canis lupus). The caribou is listed as “Vulnerable” on the International Union for Conservation of Nature (IUCN) Red List. Given this status, monitoring caribou numbers is essential, because monitoring is vital to effective conservation. We present our findings in this report through an exploration of the caribou tracking data.
The tracking data was collected by the B.C. Ministry of Environment & Climate Change over 28 years (1988 to 2016) and was prepared for the study of the management and recovery of the caribou. It includes information on 286 caribou and covers almost 250,000 locations.
vis_miss(individuals)
After reading the data, we observe that over half of the values in the individuals data are missing (refer to Figure 1). Most variables cannot be analysed because a large proportion of their values are NA; for example, 93.36% of the values in the pregnant variable are missing.
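The per-variable missingness quoted above can be computed directly rather than read off the plot. The following is a minimal sketch in base R on a small made-up data frame (the `toy` values are hypothetical and are not taken from the caribou data); applying the same call to `individuals` should give the percentage of NA values in each column.

```r
# Hypothetical toy data standing in for the individuals table
toy <- data.frame(
  animal_id = c("a1", "a2", "a3", "a4"),
  sex       = c("f", "f", NA, "m"),
  pregnant  = c(NA, TRUE, NA, NA)
)

# is.na() gives a logical matrix; the column means are the
# proportions of missing values, converted here to percentages
pct_missing <- round(colMeans(is.na(toy)) * 100, 2)
pct_missing
```

Sorting `pct_missing` in decreasing order is a quick way to see which variables are too sparse to analyse.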
Caribou have a low reproductive rate because females have only one calf per year and do not reproduce until they are two years old. The sex ratio would therefore be a good indicator of the trend in caribou numbers. However, only five of the 286 caribou are male, so any analysis that uses the sex ratio as an indicator would be biased.
The dataset tracks woodland caribou in northern British Columbia and was published in the Movebank Data Repository at https://www.datarepository.movebank.org/handle/10255/move.955. The data was collected by fitting trackers to 260 caribou, which produced almost 250,000 location tags from 1988 to 2016, and it was accessed through Movebank.
The boreal woodland caribou (Rangifer tarandus caribou), also known as woodland caribou, boreal forest caribou and forest-dwelling caribou, is a North American subspecies of the reindeer (or the caribou in North America), with the vast majority of animals in Canada. They prefer lichen-rich mature forests and mainly live in marshes, bogs, lakes and river regions. Caribou are considered an ancient member of the deer family Cervidae (Banfield, 1974). They are smaller than Moose (Alces americanus) and Elk (Cervus canadensis), standing 1.0–1.2 m high at the shoulder (Thomas and Gray, 2002). Because the caribou is classified as “Vulnerable” on the International Union for the Conservation of Nature’s (IUCN) Red List, the data was provided for the B.C. Ministry of Environment & Climate Change study reporting on the management and recovery of the caribou.
Because this data set is used for analysing the reproduction of the species, the data was obtained by observation rather than experiment; there is no treatment group or control group. Data collection ran from 1988 to the end of 2016. Movebank captures the locations of individual animals over time by tracking the bio-logging sensors attached to the animals (Kranstauber et al., 2011). The data is split into two files, both provided in .csv format. The variables in each file are described below.
The individuals data comes from Mountain caribou in British Columbia-reference-data.csv and contains the relevant information for 286 caribou. The variables are shown in the following table:

| Variable | Class | Description |
|---|---|---|
| animal_id | character | Individual identifier for animal |
| sex | character | Sex of animal |
| life_stage | character | Age class (in years) at beginning of deployment |
| pregnant | logical | Whether animal was pregnant at beginning of deployment |
| with_calf | logical | Whether animal had a calf at time of deployment |
| death_cause | character | Cause of death |
| study_site | character | Deployment site or colony, or a location-related group such as the herd or pack name |
| deploy_on_longitude | double | Longitude where animal was released at beginning of deployment |
| deploy_on_latitude | double | Latitude where animal was released at beginning of deployment |
| deploy_on_comments | character | Additional information about tag deployment |
| deploy_off_longitude | double | Longitude where deployment ended |
| deploy_off_latitude | double | Latitude where deployment ended |
| deploy_off_type | character | Classification of tag deployment end (see table below for full description) |
| deploy_off_comments | character | Additional information about tag deployment end |
The locations data comes from Mountain caribou in British Columbia-gps.csv. It contains location information for each tracked caribou, recorded every 4 hours.

| Variable | Class | Description |
|---|---|---|
| event_id | double | Identifier for an individual measurement |
| animal_id | character | Individual identifier for animal |
| study_site | character | Deployment site or colony, or a location-related group such as the herd or pack name |
| season | character | Season (Summer/Winter) at time of measurement |
| timestamp | datetime | Date and time of measurement |
| longitude | double | Longitude of measurement |
| latitude | double | Latitude of measurement |
The data used is the dataset from the Science update for the South Peace Northern Caribou (Rangifer tarandus caribou pop. 15) in British Columbia, available from Movebank (BC Ministry of Environment, 2014). The raw datasets are first read with the read_csv() function. In the raw datasets the variable names use “-” instead of “_”. A dash in a variable name can cause problems, because a syntactically valid name in R consists only of letters, numbers, and the dot or underscore characters. A second problem is that the values in “animal-life-stage” contain inconsistent spacing. Finally, the datasets have a lot of NA values. The data is therefore cleaned using the tidyverse and janitor libraries.
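To make the naming problem concrete, the sketch below uses base R only. The column names and life stage values are hypothetical examples written in the style described above, and `gsub()` is used here as a simple stand-in for janitor's clean_names(), not as a reproduction of what that function does internally.

```r
# Hypothetical raw column names with dashes
raw_names <- c("animal-id", "deploy-off-latitude", "animal-life-stage")

# make.names() shows what base R accepts as a syntactic name:
# each invalid character (here the dash) is replaced by a dot
syntactic_names <- make.names(raw_names)

# Simple stand-in for clean_names(): swap dashes for underscores
tidy_names <- gsub("-", "_", raw_names, fixed = TRUE)

# Inconsistent spacing inside values can be removed the same way
life_stage <- c("2 - 3", "2-3", " 3 - 4")  # hypothetical values
life_stage_clean <- gsub(" ", "", life_stage, fixed = TRUE)
```

A column with a dash in its name can still be accessed by wrapping the name in backticks, but underscore names avoid that extra quoting throughout the analysis.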
To clean the individuals data, the clean_names() function from the janitor package is applied first. It returns the data frame with the variable names converted into a tidier form; for example, the raw name “deploy-off-latitude” becomes “deploy_off_latitude”. The result is then passed to transmute(), which computes new columns and drops every column that is not named in the call, so it is used here to select and rename the variables. Within transmute(), whitespace in the life stage values is removed with str_remove_all() to address the inconsistent spacing. Because the “reproductive_condition” variable actually contains two dimensions, it is then split into “pregnant” and “with_calf” with the separate() function. The mutate() function strips the text labels from these two columns, and their values are finally converted to TRUE or FALSE.
The locations data is cleaned in the same way: clean_names() is applied first to obtain a data frame with clean names, and transmute() is then used to compute the new columns and drop the remaining ones. After both datasets are cleaned, they are written to csv files with the write_csv() function.
# Load libraries
library(tidyverse)
library(janitor)
# Import data
individuals_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-reference-data.csv")
locations_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-gps.csv")
# Clean individuals
individuals <- individuals_raw %>%
  clean_names() %>%
  transmute(animal_id,
            sex = animal_sex,
            # Getting rid of whitespace to address inconsistent spacing
            # NOTE: life stage is as of the beginning of deployment
            life_stage = str_remove_all(animal_life_stage, " "),
            reproductive_condition = animal_reproductive_condition,
            # Cause of death "cod" is embedded in a comment field
            death_cause = str_remove(animal_death_comments, ".*cod "),
            study_site,
            deploy_on_longitude,
            deploy_on_latitude,
            # Renaming to maintain consistency "deploy_on_FIELD" and "deploy_off_FIELD"
            deploy_on_comments = deployment_comments,
            deploy_off_longitude,
            deploy_off_latitude,
            deploy_off_type = deployment_end_type,
            deploy_off_comments = deployment_end_comments) %>%
  # reproductive_condition actually has two dimensions
  separate(reproductive_condition, into = c("pregnant", "with_calf"), sep = ";", fill = "left") %>%
  mutate(pregnant = str_remove(pregnant, "pregnant: ?"),
         with_calf = str_remove(with_calf, "with calf: ?")) %>%
  # TRUE and FALSE are indicated by Yes/No or Y/N
  mutate_at(vars(pregnant:with_calf), ~ case_when(str_detect(., "Y") ~ TRUE,
                                                  str_detect(., "N") ~ FALSE,
                                                  TRUE ~ NA))
# Clean locations
locations <- locations_raw %>%
  clean_names() %>%
  transmute(event_id,
            animal_id = individual_local_identifier,
            study_site = comments,
            season = study_specific_measurement,
            timestamp,
            longitude = location_long,
            latitude = location_lat)
# Write to CSV
write_csv(individuals, "./caribou-location-tracking/individuals.csv")
write_csv(locations, "./caribou-location-tracking/locations.csv")
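The handling of reproductive_condition is the least obvious step in the script, so a small worked example may help. The sketch below applies the same separate(), str_remove() and case_when() steps to three made-up strings; the input values are hypothetical and only illustrate the pattern, and across() is used in place of mutate_at() on the assumption that dplyr 1.0.0 or later is installed.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical raw values: both parts present, only the calf part, missing
toy <- tibble(
  reproductive_condition = c("pregnant: Yes; with calf: N",
                             "with calf: Y",
                             NA)
)

toy_clean <- toy %>%
  # fill = "left" pads with NA on the left when only one piece is found,
  # so a lone "with calf" entry lands in the with_calf column
  separate(reproductive_condition, into = c("pregnant", "with_calf"),
           sep = ";", fill = "left") %>%
  # Strip the text labels, leaving only the Yes/No or Y/N part
  mutate(pregnant = str_remove(pregnant, "pregnant: ?"),
         with_calf = str_remove(with_calf, "with calf: ?")) %>%
  # Anything containing "Y" becomes TRUE, "N" becomes FALSE, the rest NA
  mutate(across(pregnant:with_calf,
                ~ case_when(str_detect(.x, "Y") ~ TRUE,
                            str_detect(.x, "N") ~ FALSE,
                            TRUE ~ NA)))

toy_clean
```

Without fill = "left", the lone "with calf: Y" entry would be placed in the pregnant column, which is why the argument matters here.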
This dataset is primarily used to analyse changes in the number of caribou from 1988 to 2016 in order to observe the survival of the species. Because a management plan was introduced, we would also like to analyse whether it has been effective in increasing the number of caribou over time.
The primary question is: how has the number of caribou changed over time?
From the primary question, we derived three secondary questions:

- Do the habitats vary between summer and winter?
- How does the classification of tag deployment end (deploy_off_type) change over time?
- Has the management plan increased the number of caribou?
caribou_trend <- locations %>%
  separate(timestamp, c("date", "time"), sep = " ") %>%
  mutate(month = month(date), year = year(date)) %>%
  group_by(animal_id, study_site, month, year) %>%
  summarise(n = 1) %>%
  group_by(month, year) %>%
  summarise(count = sum(n)) %>%
  mutate(date = as.Date(paste(year, as.numeric(month), "01", sep="-"),
                        format = "%Y-%m-%d"))
trend_plot <- ggplot(caribou_trend, aes(x = date, y = count)) +
  geom_line() +
  xlab("") +
  ylab("Number of caribou tracked") +
  theme_bw()
ggplotly(trend_plot)
Monthly number of caribou tracked between 1988 and 2016
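The monthly count in the trend code treats an animal as tracked in a month if it has at least one location record in that month. A more direct way to express the same idea is sketched below on made-up data; the `toy` rows are hypothetical, the sketch ignores study_site, and it assumes that dplyr and lubridate are available.

```r
library(dplyr)
library(lubridate)

# Hypothetical location records: animal A has two fixes in January
toy <- tibble(
  animal_id = c("A", "A", "B", "A"),
  timestamp = as.POSIXct(c("2001-01-03 04:00:00", "2001-01-20 08:00:00",
                           "2001-01-05 00:00:00", "2001-02-01 12:00:00"),
                         tz = "UTC")
)

monthly <- toy %>%
  # Round every timestamp down to the first day of its month
  mutate(month_start = as.Date(floor_date(timestamp, "month"))) %>%
  # Keep one row per animal per month, however many fixes it has
  distinct(animal_id, month_start) %>%
  # Count the animals seen in each month
  count(month_start, name = "count")

monthly
```

Because the count depends on how many collars were active, a rise or fall in this series reflects tracking effort as well as the size of the caribou population.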
# get map data
caribou_map <- get_map(location = c(-125, 52.5, -119, 57.6), source = "osm")
ggmap(caribou_map) +
  geom_point(data = locations,
             aes(x = longitude, y = latitude, color = season),
             alpha = 0.5, size = 0.5) +
  theme_void() +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())
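A numerical summary per season could complement the map when asking whether habitats differ between summer and winter. The sketch below is one possible approach, not the analysis used in this report: it averages the coordinates by season on a made-up data frame (the `toy` coordinates are hypothetical) and assumes dplyr is available.

```r
library(dplyr)

# Hypothetical location fixes with a season label
toy <- tibble(
  season    = c("Summer", "Summer", "Winter", "Winter"),
  longitude = c(-122.0, -121.0, -123.0, -122.0),
  latitude  = c(55.0, 56.0, 54.0, 55.0)
)

season_summary <- toy %>%
  group_by(season) %>%
  summarise(mean_longitude = mean(longitude),
            mean_latitude  = mean(latitude),
            n_fixes        = n(),
            .groups = "drop")

season_summary
```

Applied to `locations`, a clear gap between the seasonal means would suggest a shift in range, although averaging over all herds can hide movements within individual study sites.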